A combined method of optimized learning vector quantization and neuro-fuzzy techniques for predicting unified Parkinson's disease rating scale using vocal features

Parkinson's Disease (PD) is a common disorder of the central nervous system. The Unified Parkinson's Disease Rating Scale (UPDRS) is commonly used to track PD symptom progression because it reflects the presence and severity of symptoms. To model the relationship between speech signal properties and UPDRS scores, this study develops a new method using an Adaptive Neuro-Fuzzy Inference System (ANFIS) and Optimized Learning Rate Learning Vector Quantization (OLVQ1). ANFIS is developed for different Membership Functions (MFs). The method is evaluated using the Parkinson's telemonitoring dataset, which includes a total of 5875 voice recordings from 42 individuals (28 men and 14 women) in the early stages of PD. The dataset comprises 16 vocal features together with the Motor-UPDRS and Total-UPDRS scores. The method is compared with other learning techniques. The results show that OLVQ1 combined with ANFIS provides the best results in predicting Motor-UPDRS and Total-UPDRS. The lowest Root Mean Square Error (RMSE) values (UPDRS (Total) = 0.5732; UPDRS (Motor) = 0.5645) and highest R-squared values (UPDRS (Total) = 0.9876; UPDRS (Motor) = 0.9911) are obtained by this method. The results are discussed and directions for future studies are presented.
i. ANFIS and OLVQ1 are combined to predict UPDRS.
ii. OLVQ1 is used for PD data segmentation.
iii. ANFIS is developed for different MFs to predict Motor-UPDRS and Total-UPDRS.


Keywords: Parkinson's disease; Neuro-fuzzy; Optimized learning rate; Motor-UPDRS; Total-UPDRS; Learning vector quantization

Specifications table
Subject area: Neuroscience
More specific subject area: Parkinson's disease
Name of your method: A Combined Method of Optimized Learning Vector Quantization and Neuro-Fuzzy Techniques
Name and reference of original method:

Method details
Tracking the progression of Parkinson's Disease (PD) remotely permits patients to be monitored without their physical presence in the clinic. Patients typically collect data at home using monitoring devices, and the data are then transmitted to the clinic via telephone or internet connections. Remote tracking offers a promising solution for managing a growing patient population, especially where geographical constraints or limited resources make traditional clinic-based care challenging. The UPDRS is commonly used to track PD symptom progression because it reflects the presence and severity of symptoms. It has been suggested that the progression of PD symptoms can be tracked by linking measures of PD dysphonia to the Motor-UPDRS and Total-UPDRS [1,2]. Machine learning algorithms have the potential to assist physicians in both diagnosing Parkinson's disease and quantifying its progression by extracting valuable patterns from processed data [3]. To model the relationship between speech signal properties and UPDRS scores, various machine learning techniques have been employed, such as Support Vector Machines (SVMs) [4][5][6], Adaptive Neuro-Fuzzy Inference System [7,8], Support Vector Regression (SVR) [9], Neural Networks [10][11][12][13][14][15][16][17][18], and Gaussian Process Regression [19].
In contrast with previous methods for PD diagnosis, which rely solely on supervised learning techniques, this study develops a new method using the Adaptive Neuro-Fuzzy Inference System (ANFIS) and Optimized Learning Rate Learning Vector Quantization (OLVQ1). ANFIS models are developed for different Membership Functions (MFs) with a hybrid learning algorithm. The method is evaluated using the Parkinson's telemonitoring dataset, which includes a total of 5875 voice recordings from 42 individuals (28 men and 14 women) in the early stages of PD. The dataset comprises 16 vocal features together with the Motor-UPDRS and Total-UPDRS scores. The method is compared with Support Vector Regression (SVR), ANFIS, Gaussian Process Regression (GPR), and the combination of OLVQ1 with ANFIS for different MFs (Triangular, Trapezoidal, Generalized Bell, and Gaussian).
The two techniques that make up the method, LVQ (with its OLVQ1 variant) and ANFIS, are introduced in the following sections.

LVQ
LVQ is an algorithm for supervised competitive neural network learning [20]. The LVQ network is illustrated in Fig. 1. A training set $T = \{(x_i, y_i)\}_{i=1}^{N}$ is given to the LVQ network, where $x_i$ are the presented training instances and $y_i$ are their class labels. The LVQ algorithm initializes the prototypes $m_j$, $j = 1, \dots, K$, by random selection of $K$ instances from the training set $T$. In each iteration of the network training, the position of a prototype $m_j$ is adjusted based on its distance to $x_i$. If a prototype $m_j$ and the input sample $x_i$ belong to the same class, the prototype moves towards $x_i$. If they belong to different classes, the prototype moves in the opposite direction. This process of updating the prototype locations continues iteratively.
In the classification stage, an instance $x$ is given the label of the class corresponding to its nearest prototype $m_{j^*}$, where the nearest prototype can be defined in Eq. (1) as:
$$m_{j^*} = \arg\min_{m_j} \lVert x - m_j \rVert \tag{1}$$
LVQ1 is the first developed LVQ network. In LVQ1, during each iteration $t$ and for each example $x_i$, the first step involves calculating the distance between the training instance $x_i$ and all prototypes $m_j$.
Accordingly, we can define the index of the winning prototype $m_{j^*}$ as:
$$j^* = \arg\min_{j} \lVert x_i - m_j(t) \rVert$$
Then, we have:
$$m_{j^*}(t+1) = \begin{cases} m_{j^*}(t) + \eta(t)\,[x_i - m_{j^*}(t)], & \text{if } m_{j^*} \text{ and } x_i \text{ share the same class} \\ m_{j^*}(t) - \eta(t)\,[x_i - m_{j^*}(t)], & \text{otherwise} \end{cases}$$
If $m_{j^*}$ and $x_i$ share the same class, the winning neuron is adjusted towards $x_i$. Conversely, if they belong to different classes, the winning neuron is pushed away. The adjustment of the winning neuron's position is influenced by the global learning rate $\eta(t)$, which can either remain constant or decrease over time $t$, with values ranging from 0 to 1.
Optimized learning rate LVQ1, or OLVQ1, is an enhanced variation of LVQ1 that incorporates an individual learning rate $\eta_j(t)$ for each prototype $m_j(t)$ in the learning rule, rather than utilizing a global learning rate $\eta(t)$. OLVQ1 aims to expedite the convergence process. The local learning rate $\eta_j(t)$ is defined as:
$$\eta_j(t) = \frac{\eta_j(t-1)}{1 + s(t)\,\eta_j(t-1)}$$
The initial learning rate, denoted as $\eta_j(0)$, is used as the starting point for each prototype's learning rate. The value of $s(t)$ is determined by the class membership of $m_j$ and $x_i$, with $s(t)$ equal to 1 if they belong to the same class and $s(t)$ equal to -1 otherwise. It is important to note that the learning rate $\eta_j(t)$ has the potential to increase. To prevent uncontrolled growth, an upper bound $\eta_{\max}$, which falls within the range of 0 to 1, is defined for each $\eta_j(t)$.
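The OLVQ1 procedure described above (find the nearest prototype, adapt that prototype's own learning rate, then move it towards or away from the sample) can be sketched in a few lines of NumPy. This is only an illustrative sketch under assumptions: the function names `olvq1_fit` and `lvq_predict`, the parameters `eta0`, `eta_max`, and `epochs`, the Euclidean distance, and the ordering "update the rate first, then move the winner" are choices made here, not details confirmed by the paper. The sketch also assumes class labels are available for the prototypes; how labels were obtained for segmenting the PD data is not stated in the text.

```python
import numpy as np

def olvq1_fit(X, y, prototypes, proto_labels, eta0=0.3, eta_max=0.3, epochs=10, seed=0):
    """Train prototypes with an OLVQ1-style rule (illustrative sketch).

    Returns the adapted prototypes and the final per-prototype learning rates.
    eta0 and eta_max are assumed to lie strictly between 0 and 1.
    """
    rng = np.random.default_rng(seed)
    m = np.array(prototypes, dtype=float)      # copy so the caller's data is untouched
    eta = np.full(len(m), float(eta0))         # one learning rate per prototype
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            # Winner: prototype nearest to the sample (Euclidean distance assumed).
            c = int(np.argmin(np.linalg.norm(m - X[i], axis=1)))
            # s = +1 when winner and sample share a class, -1 otherwise.
            s = 1.0 if proto_labels[c] == y[i] else -1.0
            # Local rate shrinks on correct wins, grows on wrong ones, capped at eta_max.
            eta[c] = min(eta[c] / (1.0 + s * eta[c]), eta_max)
            # Move the winner towards (s = +1) or away from (s = -1) the sample.
            m[c] += s * eta[c] * (X[i] - m[c])
    return m, eta

def lvq_predict(X, prototypes, proto_labels):
    """Assign each row of X the label of its nearest prototype."""
    d = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
    return np.asarray(proto_labels)[np.argmin(d, axis=1)]
```

With plain LVQ1, the only change would be to replace the per-prototype array `eta` by a single global rate.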

ANFIS
In this study, the Adaptive Neuro-Fuzzy Inference System (ANFIS) [21] is employed to predict the Total- and Motor-UPDRS using a set of speech signals (dysphonia measures). ANFIS combines fuzzy logic and neural network methodologies and is commonly utilized in prediction tasks. By establishing mappings between input and output variables, ANFIS tunes the membership functions so that a set of fuzzy rules produces accurate predictions. ANFIS supports various types of Membership Functions (MFs), including Triangular MF, Trapezoidal MF, Generalized Bell MF, and Gaussian MF. This research employs all of these MFs in ANFIS modeling to predict the UPDRS score. ANFIS is structured into five distinct layers, as illustrated in Fig. 2.
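For reference, the four membership function shapes named above can be written down directly. The sketch below uses their commonly used textbook parameterizations; the function and parameter names are illustrative assumptions, and the paper does not state which parameterization or toolbox was used. The triangular and trapezoidal versions assume strictly increasing break points.

```python
import numpy as np

def triangular_mf(x, a, b, c):
    """Triangle with feet at a and c and peak at b (requires a < b < c)."""
    x = np.asarray(x, dtype=float)
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapezoidal_mf(x, a, b, c, d):
    """Trapezoid with feet at a and d and plateau on [b, c] (requires a < b <= c < d)."""
    x = np.asarray(x, dtype=float)
    rising = np.minimum((x - a) / (b - a), 1.0)
    return np.maximum(np.minimum(rising, (d - x) / (d - c)), 0.0)

def generalized_bell_mf(x, a, b, c):
    """Bell centred at c; a sets the width and b the steepness of the shoulders."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def gaussian_mf(x, c, sigma):
    """Gaussian centred at c with spread sigma."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)
```

Each function returns a membership degree in [0, 1]; in ANFIS the parameters (for example `c` and `sigma`) are the quantities adjusted during training.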

Data analysis and results
The Parkinson's telemonitoring dataset was developed through a collaboration between Athanasios Tsanas and Max Little from the University of Oxford, along with 10 medical centers in the US and Intel Corporation. It was designed to work in conjunction with the AHTD telemonitoring device, which was created specifically for recording speech signals from individuals with Parkinson's disease (PD) [1]. The dataset became available on the UCI Machine Learning Archive in October 2009. It includes recordings from 42 individuals (28 men and 14 women) in the early stages of PD. Each patient contributed approximately 200 voice recordings, and the dataset contains 5875 voice recordings in total. The recordings were made while the patients sustained the vowel sound /a/. The dataset comprises 26 attributes, including the subject's number, age, gender, time interval from baseline recruitment date, Motor-UPDRS, Total-UPDRS, and 16 biomedical voice measures, also known as vocal features (see Table 1). The vocal features cover a wide variety of measurements such as jitter, shimmer, HNR, and NHR.
The Motor-UPDRS and Total-UPDRS scores (the two outputs of the dataset) were assessed at the beginning of the trial, after three months, and after six months of treatment. Voice recordings, on the other hand, were collected on a weekly basis. The Motor-UPDRS and Total-UPDRS scores were therefore linearly interpolated so that every voice recording has a corresponding score. The baseline, three-month, and six-month UPDRS scores are presented in Table 1 of the original research publication. Additionally, corresponding feature labels and concise explanations for each measurement are included in this table. Furthermore, some fundamental statistics regarding the dataset are provided in Table 1. This dataset has been widely used by researchers [1,22-24] in the field of Parkinson's disease to develop algorithms for the early detection and monitoring of PD symptoms based on vocal characteristics.
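The linear interpolation step can be illustrated with `numpy.interp`. The sketch below is only an example under assumptions: the helper name `interpolate_updrs`, the visit days (0, 90, 180), and the score values are hypothetical and are not taken from the dataset.

```python
import numpy as np

def interpolate_updrs(visit_days, visit_scores, recording_days):
    """Linearly interpolate clinic-assessed scores onto the recording days."""
    return np.interp(recording_days, visit_days, visit_scores)

# Hypothetical example: scores assessed at baseline, three months, and six months.
visit_days = [0, 90, 180]
visit_scores = [20.0, 23.0, 29.0]          # made-up UPDRS values for illustration
weekly_days = np.arange(0, 181, 7)         # one recording per week
weekly_scores = interpolate_updrs(visit_days, visit_scores, weekly_days)
```

Each weekly recording then carries a score lying on the straight line between the two surrounding clinic assessments.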
The data were clustered using LVQ, and the results of the clustering are shown in Table 2. Nine clusters were generated from the Parkinson's telemonitoring dataset. The clusters are visualized in Fig. 3 using different principal components generated by principal component analysis.
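A projection onto principal components, of the kind used to visualize the clusters, can be sketched with a singular value decomposition. This is a minimal sketch: the function name `pca_project` is hypothetical, and whether the features were standardized before the analysis is not stated in the text, so only mean-centering is applied here.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project mean-centred data onto its leading principal components.

    Returns the component scores and the fraction of variance each component explains.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                           # centre each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)             # variance ratio per component
    return Xc @ Vt[:n_components].T, explained[:n_components]
```

Plotting the first two or three columns of the returned scores, colored by cluster index, gives a view comparable in spirit to a cluster scatter plot.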

Method evaluation
The experiment was conducted using Microsoft Windows 10 Pro on a laptop equipped with an Intel(R) Core(TM) i7-6700HQ CPU running at 2.60 GHz, with four cores and eight logical processors. To prevent overfitting, a 10-fold cross-validation approach was employed during the training of the LVQ and ANFIS models. The method is evaluated using two metrics: RMSE and $R^2$. The formulas for these metrics are presented in Eqs. (6) and (7).
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (A_i - P_i)^2} \tag{6}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (A_i - P_i)^2}{\sum_{i=1}^{n} (A_i - \bar{A})^2} \tag{7}$$
where $n$ is the number of instances in the LVQ cluster, $A_i$ denotes the actual Total- and Motor-UPDRS, $P_i$ denotes the predicted Total- and Motor-UPDRS, and $\bar{A}$ is the mean value of $A_i$.
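The two evaluation metrics can be computed as follows. The sketch takes RMSE as the square root of the mean squared difference between actual and predicted scores, and $R^2$ as one minus the ratio of the residual sum of squares to the total sum of squares; this is the standard form and is assumed here rather than confirmed from the paper's equations. The function names are illustrative.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between actual and predicted scores."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    ss_res = np.sum((a - p) ** 2)
    ss_tot = np.sum((a - a.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

For example, with actual scores `[1, 2, 3, 4]` and predictions `[2, 3, 4, 5]`, every prediction is off by one, so the RMSE is 1.0 and $R^2$ is 1 - 4/5 = 0.2.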
The data were divided into 10 equal parts, where nine parts were used for training the model and the remaining part was used for testing. The RMSE, for example, was calculated on the held-out part, and this process was repeated for all ten folds. By averaging the RMSE values across all folds, an estimate of the model's overall performance was obtained. The nine models (one per cluster) were evaluated based on their RMSE and R² values. A higher value of R² indicates a better fit of the model, whereas a lower value of RMSE indicates better performance by the predictor. ANFIS was applied to the clusters to construct the prediction models. Different membership functions were used in ANFIS (i.e., Triangular MF, Trapezoidal MF, Generalized Bell MF, and Gaussian MF). An example of the Gaussian MF is presented in Fig. 4. For each variable, three membership functions were considered. The RMSE and R-squared values were obtained for each model, and the average values were calculated for comparing the methods. In Fig. 5, we present the training times over 200 epochs for different MFs in all clusters. The 3D visualization of some relationships between inputs and outputs in the ANFIS models is shown in Fig. 6.
The results of the method evaluation are presented in Table 3. We present the RMSE and R² results for the Motor-UPDRS and the Total-UPDRS. The evaluation covers SVR, ANFIS, Gaussian Process Regression (GPR), and the combination of OLVQ1 with ANFIS for different MFs (Triangular, Trapezoidal, Generalized Bell, and Gaussian). The results show that OLVQ1 combined with ANFIS provides the best results in predicting Motor-UPDRS and Total-UPDRS. In addition, the Gaussian MF provides better results than the Trapezoidal, Generalized Bell, and Triangular MFs. The lowest RMSE values (UPDRS (Total) = 0.5732; UPDRS (Motor) = 0.5645) and highest R-squared values (UPDRS (Total) = 0.9876; UPDRS (Motor) = 0.9911) are obtained by this method. The evaluation was also performed for the LVQ1 + ANFIS method with the Gaussian MF. Its results are close to those of OLVQ1 + ANFIS with the Triangular MF. Furthermore, when comparing the results of the ANFIS and OLVQ1-ANFIS methods, there is a significant difference between the obtained accuracies, indicating that the use of OLVQ1 as a clustering technique improves the performance of the ANFIS models in predicting the Motor-UPDRS and the Total-UPDRS.
The outcome of our evaluation on the dataset also demonstrated that the method which used GPR produced comparatively accurate predictions for the Total-UPDRS and Motor-UPDRS. Overall, it is concluded that OLVQ1 combined with ANFIS has a significant advantage over LVQ1 combined with ANFIS in predicting UPDRS for tracking PD progression. Note that the RBF (Radial Basis Function) kernel was used in the SVR method, and GPR used a squared exponential kernel for constructing the prediction models. ANFIS was trained for 200 epochs with a hybrid learning approach for all models.
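The 10-fold cross-validation procedure used in this evaluation (train on nine parts, test on the tenth, repeat for every fold, and average the RMSE) can be sketched as below. Because no standard ANFIS implementation is assumed here, the sketch plugs in an ordinary least-squares model purely as a stand-in; the names `kfold_indices`, `cross_validated_rmse`, `fit_linear`, and `predict_linear` are hypothetical, and any model exposing a fit/predict pair could be substituted.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cross_validated_rmse(X, y, fit, predict, k=10, seed=0):
    """Return the mean RMSE over k folds and the list of per-fold RMSE values."""
    folds = kfold_indices(len(X), k, seed)
    scores = []
    for j, test_idx in enumerate(folds):
        # Train on every fold except the j-th, then test on the j-th.
        train_idx = np.concatenate([f for i, f in enumerate(folds) if i != j])
        model = fit(X[train_idx], y[train_idx])
        err = predict(model, X[test_idx]) - y[test_idx]
        scores.append(float(np.sqrt(np.mean(err ** 2))))
    return float(np.mean(scores)), scores

# Stand-in model (NOT ANFIS): ordinary least squares with an intercept term.
def fit_linear(X, y):
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef
```

In a per-cluster setting, this routine would be run once for each cluster's data and the resulting averages compared across methods.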

Conclusion
Parkinson's disease is a disorder of the central nervous system that, over time, reduces a person's mobility and negatively impacts their overall quality of life. The diagnosis of PD at an early stage is of the utmost significance since it permits rapid medical intervention. Methods based on machine learning play a critical part in this process because they support the creation of diagnostic instruments that are non-invasive and cost-effective. For PD detection, machine learning algorithms are able to build reliable predictive models because they can analyze a wide variety of data types such as medical records, brain scans, and voice samples. These models assist medical professionals in spotting minor shifts in symptoms, which in turn makes it easier to initiate early intervention and develop individualized treatment programs. This research has aimed to develop a new method based on machine learning techniques for PD diagnosis. The method was developed using the OLVQ1 and ANFIS machine learning techniques and evaluated using the Parkinson's telemonitoring dataset. Using LVQ, nine clusters were detected from the PD data. The ANFIS models were constructed on each LVQ cluster to predict the Motor-UPDRS and the Total-UPDRS. We performed several comparisons between the combination of OLVQ1 and ANFIS and LVQ1 + ANFIS, SVR, ANFIS, and GPR. According to the findings, the combination of OLVQ1 and ANFIS yielded the best results in predicting the Motor-UPDRS and the Total-UPDRS. In addition, the Gaussian MF obtained the best results, with the smallest RMSE values (UPDRS (Total) = 0.5732; UPDRS (Motor) = 0.5645) and the highest R-squared values (UPDRS (Total) = 0.9876; UPDRS (Motor) = 0.9911) compared to the other MFs. This work has several limitations, which can be taken into account in developing new methods for PD diagnosis. First, this study has developed the method without the use of feature selection methods. Such methods can be effective in investigating the relationship between vocal features and the Motor-UPDRS and Total-UPDRS. In addition, feature selection can be an important phase of developing ANFIS models, because as the number of features increases, ANFIS may have difficulty constructing appropriate prediction models. Second, ANFIS can be extended for incremental learning, which can significantly increase the efficacy of the proposed method. Third, our method

Fig. 5. Training times in 200 epochs for different MFs in all clusters.

Table 1
Parkinson's telemonitoring dataset for method evaluation.

Table 3
Method evaluation.